Introduction to Deep Reinforcement Learning (DRL)
Deep Reinforcement Learning (DRL) merges the high-dimensional representation capabilities of Deep Neural Networks with the optimal control framework of Reinforcement Learning. Unlike supervised or unsupervised learning, DRL agents learn through trial-and-error interaction within a dynamic environment, making sequential decisions without immediate, explicit labels. This integration allows agents to handle complex, raw inputs (like pixel data) directly.
1. The DRL Learning Paradigm
The RL agent operates in a continuous loop: observing the environment State ($S_t$), performing an Action ($A_t$), and receiving a scalar Reward ($R_{t+1}$), which may be sparse or delayed, together with the next State ($S_{t+1}$). A central challenge is the credit assignment problem: determining which past actions were responsible for a reward that arrives later.
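The observe-act-reward loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `CorridorEnv`, `run_episode`, and `random_policy` are hypothetical names invented for this example, and the simplified `reset`/`step` interface is an assumption rather than the API of any particular library. The reward is zero everywhere except at the goal, which is what makes credit assignment hard.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a 1-D corridor with a reward only at the far end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Every episode starts at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action: 1 = move right, anything else = move left (clipped to the corridor).
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # sparse: zero until the goal is reached
        return self.state, reward, done


def run_episode(env, policy, max_steps=1000):
    """Run one episode and return the list of (S_t, A_t, R_{t+1}) tuples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)                       # choose A_t from S_t
        next_state, reward, done = env.step(action)  # receive R_{t+1} and S_{t+1}
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def random_policy(state):
    # Trial-and-error baseline: ignore the state and act at random.
    return random.choice([0, 1])


random.seed(0)
episode = run_episode(CorridorEnv(length=5), random_policy)
print(f"steps taken: {len(episode)}, last transition: {episode[-1]}")
```

Every transition before the final one carries a reward of zero, so nothing in the reward signal alone says which of the earlier moves helped reach the goal.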
2. The Optimization Objective
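The ultimate goal is to discover an optimal strategy, or policy ($\pi^*$), which is a mapping from states to actions (or to probability distributions over actions), that maximizes the expected cumulative discounted return, $\mathbb{E}[G_t]$. The discount factor ($\gamma \in [0, 1]$) defines how much we value immediate rewards versus rewards expected far into the future. For the infinite sum below to stay finite when rewards are bounded, $\gamma$ must be strictly less than 1; $\gamma = 1$ is only safe in tasks that are guaranteed to terminate.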
$$G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$
Consider two limiting cases of the discount factor:
1. $\gamma = 0$
2. $\gamma \approx 1$
Describe the agent's behavioral preference in each case with respect to the timing of rewards.
If $\gamma = 0$, the agent is myopic (shortsighted): every term with $k \geq 1$ vanishes, so $G_t$ reduces to the immediate reward $R_{t+1}$. If $\gamma \approx 1$, the agent is far-sighted: distant future rewards are weighted almost as heavily as immediate ones, which leads to planning over a very long horizon.
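To make the two cases concrete, the short sketch below evaluates $G_t$ for a finite reward sequence under several discount factors. The function name `discounted_return` and the reward values are assumptions chosen for illustration; the sequence has a small immediate reward and a large delayed one.

```python
def discounted_return(rewards, gamma):
    """Compute G_t = sum_k gamma^k * R_{t+k+1} for a finite list of rewards.

    rewards[0] plays the role of R_{t+1}, rewards[1] of R_{t+2}, and so on.
    """
    g = 0.0
    # Work backwards using the recursion G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Illustrative rewards: 1 now, nothing for two steps, then 10.
rewards = [1.0, 0.0, 0.0, 10.0]

for gamma in (0.0, 0.5, 0.99):
    print(f"gamma={gamma}: G_t = {discounted_return(rewards, gamma):.5f}")
```

With $\gamma = 0$ the return equals the immediate reward alone, so the delayed reward of 10 is invisible to the agent. With $\gamma = 0.99$ the delayed reward contributes $0.99^3 \times 10 \approx 9.7$, nearly its full value.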